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1.  Introduction 


The  quasicontinuum  (QC)  method  is  commonly  employed  to  solve  a  wide  variety  of  multiscale 
problems  (1-6).  Typically,  problems  of  interest  are  those  involving  a  small  number  of  defects 
such  as  dislocations,  a  crack,  or  grain  boundary  interactions  (7).  Unconstrained  optimization 
constitutes  the  key  computational  kernel  of  this  method  (8-11).  In  this  research,  the  objective  is 
to  study  minimization  solvers  in  order  to  optimize  computational  performance.  Due  to  the 
implicit  solution  nature  of  the  QC  method  and  the  predominant  nonconvex  nature  of  the  potential 
energy  surface  with  vast  numbers  of  metastable  configurations,  one  needs  to  use  an  effective  and 
efficient  iterative  minimization  solution  technique.  The  efficiency  of  the  techniques  depends  on 
the  time  needed  to  evaluate  the  energy  expression  and  the  number  of  iterations  needed  to 
converge  to  the  minimum.  In  this  context,  a  minimization  iteration  is  defined  as  an  iteration 
when  the  direction  vector  is  updated.  Iterations  should  not  be  confused  with  function  evaluations 
nor  the  iterations  involved  with  the  linear  equations  solvers  within  the  minimization  framework. 
In  this  report,  iterative  solution  techniques  are  explored  for  the  local/nonlocal  variant  of  the  QC 
method  (4)  for  the  two-dimensional  (2-D)  situations.  The  fully  nonlocal  QC  method  with 
variable  clusters  (12)  will  be  employed  in  a  companion  report  for  three-dimensional  (3-D) 
situations. 

Three  popularly  used  unconstrained  optimizing  methods  are  steepest  descent  (SD),  nonlinear 
conjugate  gradient  (NCG),  and  Newton-Raphson  (NR)  methods.  Compared  to  the  SD  method 
(13),  NCG  methods  (8,  14,  15)  and  the  NR  method  (16)  converge  considerably  faster.  Compared 
to  the  NCG  method,  the  NR  method  converges  even  faster  as  it  employs  the  curvature  of  the 
objective  function,  in  addition  to  the  gradient  infonnation,  to  predict  a  search  direction  during  the 
minimization  iterations.  However,  due  to  the  use  of  the  second  derivative  of  an  objective 
function  (i.e.,  Hessian  matrix),  the  NR  method  suffers  from  two  drawbacks.  First,  in  each 
iteration,  the  search  direction  is  obtained  by  solving  the  Newton  equations,  which  have  GO'3) 
computational  complexity  when  employing  factorization-based  direct  solvers.  Secondly,  the 
Hessian  is  a  dense  matrix  where  the  storage  complexity  is  0(Ar2).  Alternatively,  variants  of  the 
Newton  method  are  available  such  as  the  quasi-Newton  (QNR)  and  the  truncated  Newton 
methods  (TNR).  In  TNR,  a  truncated  strategy  is  used  such  that  only  an  approximate  solution  to 
the  Newton  equations  (16-25)  is  provided.  The  basic  idea  behind  the  QNR  method  follows  from 
the  NCG  method  by  employing  the  gradients  information  of  previous  minimization  iterations 
within  the  Newton  framework,  thereby  avoiding  a  Hessian  calculation.  The  TNR  method  differs 
from  the  QNR  method  in  two  aspects.  First,  an  iterative  method  is  employed  for  the  solution  of 
the  line  search  direction.  Second,  the  Hessian  is  not  computed  from  previous  gradient 
information. 
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Among  these  four  minimization  methods,  the  NCG  and  the  NR  methods  are  currently 
implemented  for  the  local/nonlocal  variant  of  the  QC  method  found  at  qcmethod.com  for  2-D 
situations.  In  this  research,  we  examine  the  perfonnance  differences  for  two  additional  solver 
methods.  We  report  the  effectiveness  of  the  TNR  method  with  the  conjugate  gradient  method  for 
truncating  the  search  direction  and  QNR  method  with  low-rank  Hessian  update  strategies  used 
for  computing  the  Hessian  that  are  evaluated  against  the  NR  and  the  NCG  methods.  Results  of 
illustrative  examples  mainly  focus  on  the  number  of  iterations  and  CPU  time  for  the  2-D 
nanoindentation  and  shearing  grain  boundary  problems. 

The  remainder  of  this  report  is  organized  as  follows.  In  section  2,  we  review  the  basic  equations 
and  concepts  related  to  the  local/nonlocal  variant  of  the  QC  fonnulation  for  the  calculation  of 
gradients  and  Hessians.  In  section  3,  the  four  iterative  solution  techniques  considered  in  this 
report  are  overviewed.  Through  these  solvers,  the  objective  is  to  detennine  the  stable 
equilibrium  configurations  of  a  defonning  crystalline  material.  In  section  4,  the  result  findings 
of  the  TNR  method  and  QNR  with  low-rank  Hessian  update  strategy  are  evaluated  against  the 
NR  and  the  NCG  implementation.  Concluding  remarks  in  section  5  follow. 


2.  Quasicontinuum  Method 


The  two  key  components  required  for  the  minimization  process  are  the  gradient  and  Hessian 
computations.  Over  an  energy  landscape,  the  minima  are  identified  as  the  lowest  local  point. 
When  the  state  of  the  system  is  not  co-located  with  a  minimum,  in  a  linear  strategy,  the 
minimum  can  be  sought  by  following  a  combination  of  the  local  direction  of  the  topology  (i.e., 
the  gradient  and  its  local  curvature,  the  Hessian).  The  gradient  and  Hessian  are  directly 
determined  from  the  respective  first  and  second  gradients  of  an  energy  function.  In  this  section, 
the  local/nonlocal  QC  formulation  is  overviewed  as  it  provides  the  energy  function  for  our  work. 
In  section  2.5,  the  gradient  and  Hessian  specific  to  this  function  is  presented.  As  this  report  is  to 
be  self-contained,  only  those  elements  of  the  QC  formulation  vital  to  our  discussion  are 
presented.  For  further  details,  readers  are  referred  to  Shames  and  Dym  ( 4 ). 

2.1  Energy  Framework 

Figure  la  shows  the  computational  framework  of  a  typical  QC  simulation.  In  each  load  step, 
both  the  load-increment  and  resultant  deformation  are  considered  finite  but  small.  Therefore,  a 
Lagrangian  description  is  used  to  fonnulate  the  kinematic  behavior  of  the  crystal.  The  heart  of 
the  QC  method  is  the  formulation  of  the  total  potential  energy  as  an  ensemble  of  a  function 
whose  independent  variables  are  atom  and  finite-element  (FE)  nodal  coordinates.  The  ground 
state  (or  state  of  static  equilibrium)  at  zero  temperature  is  found  by  minimizing  the  total  potential 
energy  or,  equivalently,  finding  the  zero  out-of-balance  force  values  of  the  following: 

min  n(n)  ,  (1) 

116  R3A' 
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Figure  1.  (a)  Iterative  process  of  quasicontinuum  formulation  and  (b)  local  and 

nonlocal  framework:  black-filled  circles  are  interface  atoms  shared  by  local 
and  nonlocal  regions;  gray-filled  circles  are  atoms  selected  as  elemental 
nodes  in  local  region;  white-filled  circles  in  the  local  region  are  formulated 
using  kinematic  constraints. 

where  the  displacement  of  the  representative  atoms  u(X)  =  x  -  X  is  incurred  by  the  incremental 
external  force  (f ext).  Here,  N  is  much  smaller  than  the  total  number  of  degrees  of  freedom  of  the 
whole  ensemble  if  the  ensemble  were  comprised  only  of  atoms,  and  X  and  x  are  the  reference 
and  deformed  atomic  coordinates,  respectively.  The  total  potential  energy  can  be  further 
represented  as: 


n(u)  =  E(u)  -  (feast,  u>,  (2) 

where  E  is  the  strain  energy  of  the  crystal.  As  illustrated  in  figure  lb,  the  crystal  is  divided  into 
two  regions — a  local  (continuum)  region  and  a  nonlocal  (atomistic)  region.  Thus,  the  target 
strain  energy  function  may  be  additively  decomposed  into  local  and  nonlocal  components  as 
follows: 


E{  u) 

—  EXeC  (u(X))  + 

Exi{  MC.,1]  (ujVL(X),uI(X),u^(X)).  (3) 

Here,  £,  A'£,  Tan  cl  V  are  labels  for  the  local  region,  nonlocal  region,  interface  nodes,  and 
transition  area  nodes  (6),  respectively.  It  is  noted  that  X  =  C  n  MC.  and  the  transition  area  nodes 
provide  a  seamless  coupling  between  local  and  nonlocal  regions,  and  u£,  uA  c.  u 2 and  upare  the 
associated  degrees  of  freedom  of  the  regions  £,  NC,  X  and  V,  respectively.  The  first  three,  u£, 
uA  c,  and  u2' constitute  the  different  components  of  u,  while  u^is  derived  from  ur  by  using 
kinematic  constraints.  Ec  is  computed  at  the  element  integration  points  and  /.A  c  at  the  nodes. 
Figure  lb  further  illustrates  that  within  the  nonlocal  (atomistic)  region,  a  fully  atomistic 
description  is  used  to  accurately  depict  the  defect  of  the  crystal.  In  addition,  within  the  local 
(continuum)  region,  only  a  small  number  of  atoms  (representative  -)  are  selected  to  describe  the 
local  material  properties  used  in  the  FE  method.  In  the  end,  it  should  be  remarked  that  the 
partition  of  the  ensemble  is  not  fixed.  An  adaptive  partition  strategy  is  required  to  obtain  an 
accurate  description  of  the  ensemble. 
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2.2  Atomistic  Framework 

In  this  work,  the  energy  is  formulated  from  the  embedded  atom  method  (EAM),  which  takes  the 
atomic  coordinates  as  direct  input  (7  7).  In  the  nonlocal  region,  there  is  a  one-to-one 
correspondence  between  nodes  and  atoms.  Therefore,  the  energy  of  the  nonlocal  region  is 
identical  to  a  fully  atomistic  region.  In  the  local  region,  where  there  is  a  smaller  number  of 
nodes  than  atoms,  the  energy  over  all  atoms  is  determined  by  a  kinematic  approximation  of  atom 
deformation.  By  interpolating  some  of  the  atoms  (i.e.,  those  atoms  that  are  not  nodes),  the  strain 
energy  is  a  locally  constrained  homogeneously  deformed  system.  This  kinematic  approximation 
is  known  as  the  Cauchy-Born  (CB)  rule,  which  postulates  that  when  a  monatomic  crystal  is 
subjected  to  a  small  linear  displacement  of  its  boundary,  then  all  atoms  will  follow  this 
displacement  (shown  in  figure  2). 


Figure  2.  CB  rule  which  transforms  the  reference  lattice  into  homogeneously  deformed 
(unified  deformation  gradient  tensor  F)  lattice. 

Like  other  empirical  atomistic  models,  can  be  approximated  as  the  sum  of  the  energies 
contributed  from  individual  atomic  sites  (summation  rule),  i.e., 

E$£{MCJ}<iuNL(X).xir(X),up(X))=  Y.  £^£(uAri(X),u/(X),up(X)).  (4) 

a6{JV£,I} 


Here,  is  the  site  energy  of  a 

u) 

where 


as  computed  using  the  EAM  potential: 

=  fyiPp)  +  i’a(pa)  +  \  Y  0a(7cg3)  , 


pa=  Y,  fa(ra,p) 
and 

\\ja,(3\\  <  lent  . 


(5) 

(6) 

(V) 
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The  embedding  energy  Pa  is  the  energy  associated  with  placing  an  atom  in  the  electron 
environment  described  by  local  electron  density  Pa,  and  7a,/?  =  lXrt  "  is  the  interatomic 
distance.  The  pair-potential  ^a  describes  the  electrostatic  contributions  of  core  ions,  and  7 cut  is 
the  radius  of  the  cutoff  sphere.  The  cutoff  is  introduced  to  limit  the  number  of  modeled 
pair-wise  interactions,  which  effectively  truncates  longer  range  interactions  and  accelerates  the 
convergence  of  the  sum.  Assuming  that  there  are  mP  neighboring  atoms  inside  the  cutoff  sphere 
of  atom  P,  the  deformed  relative  position  of  atom  i  near  atom  P  is  as  follows: 


r*,  =  X*,  +  u*,  -  X,,  -  U/3  . 


(8) 


If  atom  i  is  located  in  the  transition  or  local  region  (shown  in  figure  2),  U/;  is  formulated  using 
the  following  kinematic  constraint: 

N 

u/3  =  E  UaAa(X^)  .  (9) 

a=l 


Here,  ua  is  the  displacement  of  representative  atom  a,  and  Pa  is  the  FE  shape  function.  It 
follows  that 


7nc 

Jxe{A rc,i] 


(u- 


\rc 


Nnl 


(X),uI(X),uT’(X))=  £  ni3E%L{ 
/?=! 


(10) 


where  nP  is  the  number  of  atoms  represented  by  atom  Since  a  full  atomistic  strategy  is 
employed  to  formulate  the  nonlocal  region,  we  choose  n0  =  . 


2.3  Continuum  Framework 


The  strain  energy  related  to  local  region  /,;xe£(u(x))  is  computed  by  summing  the  potential 
energy  of  all  elements  within  the  local  region  as  follows: 

M 

•Bxez:(u(x))  =  E  Ee  , 

e=l 


where 

E?  =  f  We(Ye(X))dV  . 
JQe 


(11) 


(12) 


In  these  two  formulae,  M  is  the  number  of  elements  within  the  local  region  and  We  is  the  strain 
energy  density  depending  on  Fe,  a  defonnation  gradient  tensor  of  element  e.  Equation  12  is 
derived  in  accordance  with  the  CB  rule.  Implied  in  equations  10  and  12  is  that  the  present 
energy  measures  changes  relative  to  the  equilibrium  lattice  energy  of  the  crystal  such  that  the 
strain  energy  at  zero  deformation  (Fe  =  I)  in  equilibrium  is  zero.  The  elemental  deformation 
gradient  tensor 
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(13) 


Fe  =  —  =  1  +  Vu(X)  (X  6  Qe) 
dX 

describes  the  local  and  homogeneous  stretching  and  rotating  of  element  e.  According  to  the 
principle  of  the  FE  method  (26,  27),  the  displacement  field  of  the  element  is  formulated  by  using 
the  following  relation: 


Nne  ,  i  .  \ 

u(X)=^«xk,  (14) 

k= i 

where  Nne  is  the  number  of  nodes  per  element,  (X)  js  the  shape  function  related  to  elemental 
node  k,  and  uk  is  the  displacement  of  node  k.  Consequently, 


Nne 

F<;(X)  =  /  +  E  V^fc(X)  ®  uk ■> 


k=  1 


(15) 


where  <8>  is  the  tensor  product.  Furthermore,  using  a  specific  numerical  integration  method,  it 
follows  from  equation  1 1  that 


u(X)) 


M 


E  ^e£(Fe)’ 

e=l 


(16) 


which  is  evaluated  at  the  quadrature  points,  and  where  Ve  is  the  total  number  of  atoms  inside 
element  e. 

2.4  Coupling  Atomistic  and  Continuum 

The  interface  between  local  and  nonlocal  regions  is  nontrivial.  For  seamless  joining  between 
atomic  and  finite-element  method  (FEM)  regions,  the  FEM  region  must  have  an  internal  atomic 
structure  such  that:  (1)  the  assumption  that  strain  in  each  element  is  homogeneous  is  reasonable, 
(2)  elements  have  a  scale,  and  (3)  element  comer  nodes  ( ri )  are  positioned  on  one  of  the  atoms. 

For  each  element,  the  work  involved  is  as  follows: 

•  Construct  deformation  gradient  for  this  element. 

•  Pick  a  representative  atom  in  the  element. 

•  Calculate  the  energy  per  atom  using  the  atomistic  model. 

•  Assign  energy  to  the  element. 

The  link  between  atomistic  and  continuum  material  simulations  requires  a  procedure  for 
determining  the  nonlinear  elastic  continuum  response  of  a  material  with  a  particular  atomistic 
representation.  The  continuum  response  should  match  the  results  produced  by  discrete  lattice 
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simulations  at  near  atomistic  length  scales,  while  allowing  more  efficient  simulation  at  larger 
scales  through  the  application  of  the  nonlinear  continuum  theory  (28). 

In  general,  particularly  in  the  presence  of  lattice  defects,  the  deformation  cannot  be  assumed  to 
be  unifonn  at  scales  approaching  lattice  dimensions,  even  for  infinitesimal  deformations.  The 
quasicontinuum  procedure  introduces  degrees  of  freedom  into  the  stored  energy  expressions 
which  modify  the  computed  constitutive  properties  to  give  better  agreement  with  experimentally 
determined  values.  Following  equations  10  and  16,  it  is  derived  that 

M  Nnl  n 

rift(u)  =  Y  Ve£(Fe)+  Y  EpL(  rg,---iyd)  -  Y  no(fext,a,Ua)  ,  (17) 

e=l  (3=1  a=l 

where  na  is  the  weight  function  related  with  representative  atom  a.  In  general,  the  interface 
between  local  and  nonlocal  atoms  does  not  preserve  force  symmetry  because  atoms  in  local 
elements  do  not  feel  nearby  nonlocal  atoms,  though  nonlocal  atoms  do  feel  nearby  local 
elements.  As  a  remedy,  the  so-called  ghost  force  correction  is  employed.  To  this  end,  the 
energy  function  is  rewritten  as  follows: 

n/.(u)=  Y  We£(Fe)  +  Y  -  Y  na(Uxt,a  +  ,  UQ)  ‘ 

e—1  P—1 


2.5  Formulation  of  Gradient  and  Hessian 

In  order  to  obtain  the  equilibrium  configuration  of  the  solid,  unconstrained  minimization  of  the 
potential  energy  n/i(u)  is  required.  Therefore,  the  corresponding  gradient  ^ W du  and  sometimes 
Hessian  d2r\h/dv^  have  to  be  computed. 


2.5.1  Analytic  Formulation  of  Gradient 

Using  the  chain  rule,  it  follows  from  equation  18  that 


^  =  £  Vep(Ff.)|E£  -  "y 
dun  Jri  Suq  ^ 


m‘>  .  dr> 

,=i  c,un 


-  na(fQ  +  f,,)  . 


where  p  —  de/dF  js  the  first  Piola-Kirchhoff  stress  tensor.  Using 


Fe  =  I  +  Y  ’ 

a=  1 


dX 


(19) 


(20) 


where  Xe  is  the  element  centroid,  it  follows  that 


dFe  _  dlPn(Xr) 
<9ua  dX 


(21) 


Similarly, 

=  f^a(XJ5)  -  6a0]\  ,  (22) 

SuQ  p 

where  'W  is  the  Kronecker  Delta.  As  a  result,  the  out-of-balance  nodal  forces  are  computed  by 
the  following: 


7 


So  =  ^  =  E  vcP( Fe)VV-a(Xe)  -  £ 

e=l  /3=1 


M 


JVL 


mp 

E  ^V>a(Xp 


7=1 


Tnp 


+  E  Vft  -  "a  (fa  +  f«)’ 


(23) 


7=1 


where 


p 


an 

aF  is  the  first  Piola-Kirchhoff  stress  tensor  and  Xe  is  the  element  centroid. 


2.5.2  Analytic  Formulation  of  Hessian 


Based  on  the  gradient  in  (1 7),  it  can  be  further  obtained  that 


M 


Nun  i 


m~f  m1 


E  £  K“Vte(x*)^(xj' ) 


=  £  ueC(Fe)V^a(Xe)V^(Xe)  +  E 

e=l  7=1  LA-=1/=1 

771ft  771ft  m0  m0  771ft  771ft 

-  E  E  Ka^(Xi)  -EE  K/^(X$)  +  <5Q£j  £  £  K*1  , 

fc=i;=l  fc=l/=l  fc=l/=l 


(24) 


Ul;  2 

where  1  =  .if5  is  the  Lagrangian  tangent  stiffness  tensor,  k('  =  ''  i  s  the  atomic  level  stiffness 

matrix,  and  p  =  3F  is  the  first  Piola-Kirchhoff  stress  tensor.  It  is  also  useful  to  note  that  the 
general  lack  of  symmetry  in  forces  due  to  conjoined  local/nonlocal  domains,  in  the  absence  of 
correctively  applied  ghost  forces,  also  translates  to  asymmetry  in  the  Hessian.  To  demonstrate 
this,  we  refer  to  the  following  equations: 


d£(FL)dFL 
9Fl  duL 


(25) 


fiVL  — 


^  dEp  dr[3 

(3—\  dr[3  duNL 


(26) 


It  can  be  shown  that  the  partial  derivative  9uNL  —  92V\h/d\i jsfpdui  zero  However, 

the  same  is  not  true  for  the  transpose 


dfiVL  _  d2nh  =  y  n  drp  d2Ef3  dv0  ,  (27) 

duL  duLduNL  ^^^duLdrpdrpduNL 

where  the  only  nonzero  term  in  the  summation  is  when  P  —  1.  Since,  in  practice,  n/3  ■*■,  and 

based  on  the  model  in  figure  2,  rl  —  +  ul  —  X nl  ~  UNL,  we  may  simplify  equation  18 

as  follows: 

d2nh/duLdu nl  =  ~d2E1  (r1} ...,  rn)  /dr1dr1  (28) 
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3.  Solvers  for  Unconstrained  Optimization 


In  this  section,  the  TNR  and  QNR  methods  are  presented.  The  NCG  and  the  NR  methods  are 
also  briefly  reviewed. 

In  order  to  obtain  the  equilibrium  configuration  of  the  solid,  unconstrained  minimization  of  the 
total  potential  energy  II  (d)  needs  to  be  performed.  From  the  discussions  in  the  previous 
sections,  we  have  gradient  g  (equation  23),  the  Hessian  H  (equation  24,  referred  as  G  before), 
and  approximate  inverse  (B)  of  the  Hessian.  Then  a  general  minimization  iterative  solver 
framework  for  unconstrained  minimization  can  be  described  by  algorithm  1 .  It  is  evident  from 
this  algorithm  that  if  only  the  gradient  is  available,  then  QNR  (29),  steepest  descent,  and  NCG 
(30,  31)  are  applicable.  While  the  NR  method  has  better  convergence  properties,  the 
computation  of  an  exact  H  is  time-consuming  and  may  require  large  amounts  of  storage  for 
large-scale  problems.  In  algorithm  1,  convergence  is  said  to  occur  if  ||gk||  <  s  is  achieved. 


•Algorithm  1 :  General  iterative  solver  framework  for  minimization. 
Initialization:  uo  is  given; 

Relax  Iteration: 


u*+l 


u/.  +  tty-d/-.  k  —  0.  -  ■  ,  where  dy-  for: 
Newton-Raphson:  d;  =  —  (HQ-,g; 
Quasi-Newton:  dy;  =  —  By;gy- 
Steepest-descent:  d*.  =  —  g*. 


Conjugate  Gradient:  d*  —  -gy;  -I-  (3dyt  where  (3  for: 

Fletcher-Reeves  method:  pf*  =  (j]Q;) 
Polak-Ribiere  method:  ~  ~  I'1 

(C  l-  1 .2!  1  ) 


or  I  Iestenes-Stiefel  method:  (3£/s  = 


1/ 

{&  1  -gx  1  > 

IIS  _  1) 

k  (d*-i.gt-g*-i) 


3.1  Nonlinear  Conjugate  Gradient  Method 

The  nonlinear  conjugate  gradient  method  is  of  the  following  form: 

uA-+i  =  UA-  +  ak dfc  5 


(29) 


where  ak  >0  is  the  step  length  and  dk  is  search  direction.  Normally,  the  search  direction  at  the 
first  iteration  is  computed  employing  the  steepest  descent  direction,  namely,  do  =  -go.  The  other 
search  directions  can  be  defined  recursively  as  follows: 

djt+i  =  -gfc+i  +  (30) 
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Nonlinear  conjugate  gradient  methods  use  search  directions  that  combine  the  negative  gradient 
direction  with  another  direction,  chosen  so  that  the  search  will  take  place  along  a  direction  not 
previously  explored  by  the  algorithm.  For  the  linear  problems,  the  function  minimizer  is  found 
exactly  within  just  n  iterations,  thus  performance  is  always  predictable.  For  nonlinear  problems, 
performance  is  problem  dependent,  but  these  methods  have  the  advantage  that  they  require  only 
gradient  evaluations.  Memory  requirements  are  minimal,  making  this  a  popular  class  of 
algorithms  for  large-scale  optimization.  Among  common  implementations,  such  as  the  Fletcher- 
Reeves,  Polak-Ribiere,  and  Hestenes-Stiefel  methods,  the  Polak-Ribiere  nonlinear  conjugate 
method  appears  to  have  the  best  convergence  performance  (described  in  algorithm  2). 

•Algorithm  2\  Polak-Ribiere  conjugate  gradient  with  given  tolerance  s: 

(1)  Initialize  u0;  d0  =  -go; 

(2)  FOR  (k  =  1,  •  •  •  ,kmax)  DO 

(3)  Compute  a  such  that  minn(u*_i  +  ad^_| );  /*  line  search  */ 

(4)  u*  =  uk-\  +ad*_i; 

(5)  IF  (||g*||  <  e)  THEN  exit; 

(6)  /*  Polak-Ribiere  method  */ 

(7) d*  =  -B*g*  +  p*d*_,; 

(8)  ENDFOR 

3.1.1  Line  Search  Algorithm 

Define  n(a)  =  n(u  +  ad)  where  d  and  u  are  given.  This  potential  energy  can  be  redefined  as  a 
single  variable  function  as  ^(a)  —  n(ufc  +  acbc),  which  transforms  minx  n(x)  into  the 
minimizing  problem  m'na€p,ft]  <p(Q).  Commonly  used  line-search  methods  are  the  NR  method 
(where  H k  is  needed)  and  the  backtracking  approach  (9). 

3.1.2  Backtracking 

The  step  length  control  in  the  line  search  algorithm  is  mainly  governed  by  the  Wolfe  conditions, 
which  first  requires  that  to  avoid  over-approximating  the  step  length  n(“*  +  n<k)  <  n(«fc)  +  cxa(%kdk)^ 
where  ri  —  10  4,  which  is  based  on  the  sufficient  decrease  condition.  Secondly,  to  avoid  under¬ 
approximating  the  step  length,  a  condition  is  placed  on  the  curvature  and  is  given  by 
(ff(uk  +  tt/.A/.;).  dk)  >  C2 (g/,. .  where  c2  =  0-9.  A  backtracking  line-search  method  is  described 

in  algorithm  1 . 

•Algorithm  3:  Backtracking  line  search:  given  «o  >  0  and  7i>  72  (0  <  71  <  72  <  1) 

(D*  =  0; 

(2)  While  U{uk+akdk)  >  U{uk) +c\ak(gk.dk) 

(3)  Compute  (fy+i  €  [Yi0tA-Y2«A-]  such  that  Wolfe  conditions  are  satisfied; 

(4)  k  =  k  +  1; 

(5)  ENDWhile 
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Statement  (3)  in  algorithm  3  is  implemented  via  the  bisection  method,  gold  section,  or 
polynomial  interpolation  method  (9).  In  our  study,  three  point  cubic  interpolation  methods  are 
employed  and  are  described  briefly.  Assume  4>c(a)  =  aa3  +  bcP-  +  ca  +  d  that  satisfy 
0C(O)  =  4>{ 0)  4>c(0)  =  4>'{0\  4>c(ak)  =  4>(ak- 1),  and^c(afc)  =  Then, 

0c(a)  =  aa3  +  6a2  +  4>'(Q)a  +  where 


a 

—  0 

• 

OJ-Sc 

s 

1 

<t>(<*k)  -  <K0)  -  <t>'( 0)ak 

b 

—  u 

.  -afc-l  ak  \ 

0(afc-l)-0(O)-^(O)afc_! 

(31) 


and  ^  (afc- iak(ak  ak  i)).  Using  °'/(°)  —  0  such  that  4>c(ot)  is  minimized,  it  follows  that 


ak+ 1  — 


6—  \fb^— 3a(f>/(0) 


3a 


3.2  NR  Methods 

For  efficient  and  robust  implementation  of  the  NR  methods,  the  following  issues  are  critical: 

•  To  make  the  method  converge  toward  a  local  minimizer,  the  Hessian  G(A )  has  to  be  positive 
definite  to  guarantee  the  search  direction  d ^  a  descent  direction,  i.e.,  (g(A)- d^>  <  The  NR 
method  is  not  necessarily  globally  convergent,  meaning  that  it  may  not  converge  from  any 
starting  point.  The  current  guess  u(°)  should  be  sufficiently  close  to  u*  so  that  the 
truncation  error  in  equation  A-l  of  the  appendix  is  negligible.  In  such  case,  u(fc)  will 
converge  to  u*  at  a  quadratic  rate,  i.e.,  ||u(/  +1'1  -  u  ||  <  e||u^^  -  u  ||2  where  £  is  a  positive 
constant. 

•  In  large-scale  problems,  the  computation  and  storage  requirements  of  the  Hessian  may 
become  substantial.  This  can  be  handled  by  either  using  the  diagonal  terms  in  the  Hessian, 
i.e.,  ignoring  the  cross  terms,  or  avoiding  recalculation  at  each  iteration  (can  be  done  in 
instances  of  slow  variation  of  the  second  derivative). 

•  The  backtracking  line-search  scheme  is  recommended  in  the  Newton-Raphson  algorithm, 
i.e.,  in  the  full  Newton  step.  a(-  k)  is  tried  first.  If  these  steps  fail  to  satisfy  the  criterion  for 
the  decrease  of  the  function,  backtrack  in  a  systematic  way  along  the  Newton  direction. 

The  advantage  of  doing  this  is  that  fast  convergence  of  the  NR  method  will  be  obtained  as 
the  solution  gets  closer  to  the  minimum. 

3.2.1  Numerical  Formulation  of  Hessian 

Figure  3  illustrates  several  strategies  in  computing  the  Hessian  (or  the  inverse  of  Hessian) 
(16-25),  which  are  categorized  into  three  branches  of  numerical  methods.  These  are  the  analytic 
method,  the  difference  method,  and  the  low-rank  update  method.  The  fonnulation  of  the  analytic 
(exact)  Hessian  is  given  by  equation  24.  In  this  section,  we  discuss  only  the  low-rank  update 
method. 
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Figure  3.  Numerical  methods  to  formulate  Flessian  matrix. 


3.2.2  Low-Rank  Update  Method 

A  low-rank  update  method  (10,  32,  33)  builds  up  an  approximation  to  the  Hessian  by  keeping 
track  of  the  gradient  differences  along  each  step  taken  by  the  algorithm.  Various  conditions  are 
imposed  on  the  approximate  Hessian.  For  example,  its  behavior  along  the  step  just  taken  is 
forced  to  mimic  the  behavior  of  the  exact  Hessian,  and  it  is  usually  kept  positive  definite.  Given 
B^)  as  an  approximation  to  the  Hessian  G(u(A  ));  the  low-rank  update  method  iteratively  updates 
this  matrix  by  incorporating  the  curvature  of  the  problem  measured  along  the  step  according  to 
the  following  secant  condition  (or  Quasi-Newton  condition):  b(a+i)s(a)  =  ya(  where 
y(fc)  =  g(A+1)  -  g(A).s(A)  =  u(A+1)  -  uV)  Table  1  lists  four  commonly  used  low-rank  update 
methods.  Broyden-Fletcher-Goldfarb  (BFGS)  is  generally  considered  to  be  the  most  effective 
variable  metric  method. 

In  many  applications,  an  inverse  approximate  Hessian  H  A 1  %  G  is  more  commonly  used. 
Table  2  lists  four  commonly  used  rank-2  approximate  inverse  Hessians,  which  are  formulated 
subject  to  the  following  secant  condition:  HG+UgG)  =  sG). 


3.2.3  Convergence  of  the  NR  Method  Based  on  Iterative  Solver 


Let  {u(A)}  be  a  sequence  converging  to  u*.  The  convergence  about  the  associated  gradient  is 
linear,  super-linear,  and  quadratic  if  there  exists  constants  7  £  (0,1)  and  cr  >  0  such  that 


l|gG+1)ll 

l|g(fc)|| 


<  7, 


Vfc , 


(32) 


lim  lls(fc+1)H 

||g(*>|| 


=  0, 


(33) 
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Table  1.  Approximate  Hessian  computed  from  previous  iteration  gradient  information. 


Broyden-Fletcher-Goldfarb  (BFGS) 

pf/c-Fll  .  -p(k)  I  |  y(fe)xy<» 

°  °  ■*"  (g(U,d(U)  1  (y(U,s W)  ' 

Powell-Symmetric  (PSB) 

nt -|  'i  nl'A-'l  i  (y(fc)— B^isU))  Xs(fc)+s(fc)  x  (y(fc)_B(fc)s(fc))  , 

B  B  +  (sW,sW)  + 

(y(fc)_B(fe)s(fc))xs(fc)/  ffct  w  Jk')\ 

<sW,sW)2  X  S  '  • 

Davidson-Fletcher-Powell  (DFP) 

where 

SRI  (Symmetric  Rank-1  Update  ) 

.  ■ol'fc'l  (B«S(fc))x(B(fc)S(fc))  ,  yWxyW  , 

(s(fc),  s(fc)>B(A;)(W(fc)  x  w<»), 

_  y(fc)  B(fc)s(fc) 

<y(fc),S  (*0)  (s(i0,s(t))B(fc) 

d(H11_t>(H  1  (y(fc)-BWs(fc))x(y(fe)-B(fc)sW) 

13  -  13  I"  (sW,y(*)-BWsW 

Table  2.  Rank-2  update  approximation  of  Hessian  inverse. 


BFGS 

s«s«T 

PSB 

jjtfc+l')  |  (y(fc)_H(Ds(D)xsG)-|-s(t)x(y(D_H(Ds(fc)) 

(s(«,s(*)> 

><  *“>>  • 

DFP 

(^o"yW)  y(A)  H(A)y(i)ZZT  . 

Gill  and  Murray 

H^'+b  i-  |HlkV*)j(*)T  -|-  s(*)y(fc)TH(fc^  + 

_ l _ (x  +  y(*>TH(*)y<*>\  s(i)s(fc)T  . 

and 


l|g(*+1)ll 

||g(fc)||2 


<  <7,  Vfc  . 


(34) 


Theorem  1  in  the  appendix  shows  that  the  truncated  NR  method  converges  at  a  super-linear  rate. 
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3.2.4  Implementation  of  the  NR  Method 

In  our  work,  the  NR  method  is  utilized  and  described  by  algorithm  2. 

•Algorithm  4\  Newton-Raphson  algorithm  with  given  tolerance  e: 

(1)  Initialize 

(2)  d<°>  =  -g<°> 

(3)  FOR  (ft  =  0,  •  •  •  ,  fcmax)  DO 

(4)  arg  min  n(u<«  +  m#>)  /*  line  search  */ 

(5)  ua+i  =  ua-  +  ad*; 

(6)  update  GA+iancjgA-i-i 

(7)  IF  (fell  <  e)  THEN  exit; 

(8)  Gfc+id*  =  -gA+i  /*  search  direction  */ 

(9)  ENDFOR 

Two  variants  of  NR  strategies  are:  (1)  a  standard  NR  strategy  where  the  Newton  equations  are 
solved  using  the  direct  method,  i.e.,  LU  factorization,  and  (2)  a  truncated  NR  method,  where  the 
Newton  equations  in  statement  (8)  of  algorithm  2  are  solved  approximately  by  preconditioned 
conjugate  gradient  algorithm  (PCG). 

•Algorithm  5:  A  preconditioned  conjugate  gradient  solver  for  Gd  =  g,  where  the  initial  guess  do 
about  the  solution  is  given  and  C  is  the  given  preconditioner  comprises  algorithm  3. 

(1)  ro  =  g  -  Gdo;  zo  =  Cro;  Po  —  zo 

(2)  FOR  *  —  0  ■  1,  •  •  •  till  Convergence  DO  /*  Krylov  Iteration  */ 

(3)  =  (Ti,  Zj) 

{ Api,Pi > 

(4)  di+i  —  d'  +  a«Pi  /*  update  solution  */ 

(5)  r;+i  —  ri -  aeAPi  /*  update  residual  */ 

(6)  zi+ 1  =  Gri+i/*  preconditioning  */ 

(7)  n  _  (ri+i.gi+i) 

<r*»z  i) 

(8)  P/  +  1  =  z!+l  +  ftp  i 

(9)  END  FOR 

Using  the  Hessian  matrix,  the  optimal  search  direction  can  be  obtained  in  the  NR  method. 
However,  the  0(R2)  memory  requirements  and  0(R3)  associated  with  solving  a  linear  system 
involving  a  Hessian  restricts  the  NR  methods  only  to  (1)  small-scale  problems,  (2)  problems  with 
special  sparsity  patterns,  or  (3)  near  a  solution.  As  a  remedy,  Quasi-Newton,  discrete  Newton, 
and  truncated  Newton  methods  arise.  The  principle  of  these  alternate  methods  is  to  obtain  an 
approximation  to  the  inverse  of  the  Hessian  matrix,  i.e.,  B(A  )  «  G-1(u(D)5 10  obviate  the  difficulties 
that  accompany  its  calculation  and  storage.  One  may  also  perform  a  nontrivial  port  of  the  NR 
method  onto  distributed  computer  systems  using  representative-atom-based,  domain- 
decomposition  methods.  In  light  of  these  difficulties  involving  the  Hessian,  an  out-of-core 
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truncated  NR  strategy  is  implemented  and  tested.  This  truncated  Newton  method  is  based  on  the 
idea  that  an  exact  solution  of  the  Newton  equation  at  every  step  is  difficult  and  unnecessary  and 
can  be  computationally  wasteful  in  the  framework  of  a  basic  descent  method.  The  defining 
feature  is  that  the  search  direction  d  in  the  truncated  Newton  algorithm  is  computed  using 
algorithm  6.  While  employing  conjugate  gradients  (34)  to  approximate  the  search  direction,  the 
resulting  convergence  rate  of  algorithm  6  is  strongly  dependent  on  the  condition  number  of  G^). 
Among  various  methods,  the  preconditioning  based  on  incomplete  Cholesky  factorization  (34) 
has  the  best  convergence  perfonnance.  However,  considering  the  large  dimension  of  QC 
simulations  and  truncated  nature  of  the  search  direction,  only  a  diagonal-scaling  preconditioner 
is  often  needed.  This  is  the  preconditioning  approach  employed  in  this  work. 

•Algorithm  6:  Generating  search  direction  d  in  truncated  Newton  method: 


(1) 

Initialization: 

(2) 

PO  =  0;  q0  =  0;  r0  =  -g 

(3) 

do  =  CgrQ 

(4) 

Iteration: 

(5) 

FOR  (*  —  0,  •  •  •  ,  i max)  DO 

(6) 

/*  Section(l):  negative  curvature  test : 

(7) 

IF  (<di,di>G  <  tf<di,di>)  RETURN 

(8) 

/*  Section(2):  truncation  test  */ 

(9) 

0 

II 

p 

►i 

JS 

(do  dj)  G 

(10) 

Pi+i  “  P,  + 

(11) 

ri+l  —  —  UjGd, 

(12) 

q«+l  =  ^(Pi+l’U+l  +») 

(13) 

llu+ill  <  e  RETURN 

(14) 

,  _  (Cr;+1,ri+1) 

IF  (Cr 

(15) 

df+i  =  Cjri+1  +  (3 id, 

(16) 

ENDFOR 

4.  Numerical  Experiments  and  Results 


The  defonnation  of  crystalline  materials  is  commonly  studied  based  on  lattice  mechanisms  and 
boundary  mechanisms.  In  the  lattice  mechanisms,  deformation  occurs  by  processes  taking  place 
within  the  grains.  In  the  boundary  mechanisms  (7),  deformation  occurs  by  processes  associated 
with  the  presence  of  grain  boundaries.  Grain  boundary  sliding  and  diffusional  creep  are  two 
major  processes  of  boundary  mechanisms.  The  NR,  NCG  method,  QNR,  and  TNR  methods  are 
employed  to  simulate  these  two  problems. 
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4.1  Shearing  Grain  Boundary 

The  grain  boundary  is  comprised  of  two  twins  connected  through  an  angled  domain  wall.  The 
shearing  boundary  conditions  are  applied  at  the  upper  and  lower  boundaries.  Lateral  boundaries 
are  connected  through  periodicity  conditions.  This  problem  is  merely  used  as  a  test  for 
convergence  properties  of  the  various  solution  algorithms.  A  grain  boundary  sliding  process  (7) 
employing  the  TNR  method  is  demonstrated  in  figure  4.  The  solutions  exactly  match  with  the 
results  of  the  NCG  and  NR  methods  found  on  qcmethod.com. 


Figure  4.  Sliding  of  grain  boundary:  (a)  1st  load  step,  (b)  4th  load 
step,  (c)  7th  load  step,  and  (d)  1 1th  load  step. 


Figures  5-7  compare  the  performance  of  the  NR,  NCG,  and  TNR  methods.  In  this  report,  the 
NR  method  is  based  on  LU  factorization  (direct  solver,  available  in  qcmethod.com)  and  PCG 
iterative  solver  (algorithm  3,  implemented).  A  convergence  tolerance  for  the  PCG  is  said  to  be 
reached  once  the  following  condition  is  met: 

l|r*ll/llroll  <  e  ,  (35) 

where  s  =10  ,  10  ,  1 0  ,  10  are  employed  in  our  numerical  experiments  for  TNR  method  and 
r  is  the  residual  in  the  PCG  solution  iterations.  In  our  implementation,  a  diagonal  scaling 
preconditioner  is  used  to  accelerate  the  convergence  of  PCG.  Figure  5  shows  that  the  number  of 
required  Newton  iterations  increases  with  the  increase  in  the  convergence  criteria  selected  for  the 
approximate  search  direction  solver  (PCG).  In  the  limit,  as  the  convergence  criteria  is  decreased, 
the  TNR  method  reduces  to  the  NR  method  while,  on  the  other  hand,  it  will  incur  more  inner 
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Figure  5.  Number  of  minimization  iterations  for  NCG,  NR,  and  TNR  for  shear  grain  boundary 
problem  for  10th  load  step. 


Figure  6.  CPU  time  for  NCG,  NR,  and  TNR  for  shear  grain  boundary  problem. 


Figure  7.  Time  cost  per  minimizing  iterations  for  TNR  and  NCG  and  number  of  iteration  of 
linear  conjugate  gradient  for  the  TNR  for  shear  grain  boundary  problem. 
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PCG  iterations.  Figure  6a  and  b  shows  the  CPU  time  for  the  four  methods  considered.  It  can  be 
seen  that  for  small  PCG  convergence  tolerance  (less  than  s  =  10  7)  and  larger  PCG  convergence 
tolerances  (>  s  =  1 O^1),  the  CPU  time  is  greater  than  the  NR  method.  For  tolerance  between 
10~7<  s  <  10  1 ,  the  CPU  time  is  less  than  the  NR  method.  In  all  the  cases,  the  CPU  time  is  less 
than  the  NCG  method.  The  NR  and  TNR  methods  reduce  the  number  of  nonlinear  iterations  by 
orders  of  magnitude.  Figure  7  shows  that  the  NR  method  takes  about  66  s  per  iteration  and  NCG 
only  takes  0.5-1. 4  s  per  iterations. 

4.2  Simulation  of  Nanoindentation 

In  nanoindentation,  a  nanometer  scale  indenter  is  pushed  at  constant  speed  from  a  given  height  to 
a  given  depth  into  the  material  (loading),  and  is  subsequently  retracted  to  its  original  position 
following  the  same  path  (unloading).  For  single  crystals  the  measured  load-displacement 
response,  which  shows  the  force  required  to  push  the  tip  a  certain  distance  into  the  substrate, 
shows  characteristic  discontinuities.  A  sequence  of  deformation  of  the  simulations  is  shown  in 
figure  8.  Figure  9  shows  performance  comparisons  of  the  NCG,  NR,  and  the  QNR  methods. 
From  the  figure,  it  is  clear  that  the  QNR  performance  is  almost  the  same  as  the  NCG  method. 
Figure  10  shows  the  convergence  performance  of  the  NR,  NCG,  and  TNR  methods  with  respect 


Figure  8.  Nano  indentation  problem:  (a)  15  th  load  step,  (b)  20th  load  step, 
(c)  25th  load  step,  and  (d)  30th  load  step. 
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Figure  9.  Performance  comparison  of  the  NCG,  the  NR,  and  QNR 
based  rank-2  update  Hessian  for  1  Oth  load  step  of 
nanoindentation  problem. 


Figure  10.  Convergence  by  nonlinear  iteration  for  nanoindentation:  (a)  initial  load  step;  (b)  stage  1  of  1st  load  step; 
(c)  2nd  load  step;  (d)  3rd  load  step;  and  (e)  4th  load  step;  NR-PCG  =  TNR. 
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to  various  stages  of  the  indentation  process.  It  can  be  seen  that  NCG  takes  the  largest  number  of 
iterations  followed  by  TNR,  with  s  =  10  "  and  with  e  =  10  ~  during  the  initialization  stage, 

1st  load  step  and  10th  load  step.  Further,  figure  1 1  shows  the  CPU  times  of  the  NR  and  NCG 
methods  for  various  load  stages.  It  can  seen  again  that  NCG  takes  most  CPU  time  followed  by 
the  TNR  method,  with  s  =  10  and  with  s  =  10  during  the  initialization  stage,  1st  load  step  and 
10th  load  step.  This  is  same  performance  as  in  the  case  of  shear  grain  boundary  problem 
considered  previously. 


Figure  11.  Convergence  by  timing  cost  (s)  for  nanoindentation:  (1)  initial  load  step;  (2)  stage  1  of  1st  load  step; 

(3)  stage  2  of  1st  load  step;  (4)  stage  1  of  10th  load  step;  (5)  stage  2  of  10th  load  step,  and  (6)  5th  load 
step;  NR-PCG  =  TNR. 


5.  Conclusions 


In  the  context  of  the  two  example  problems  studied  in  this  report,  we  can  reasonably  conclude 
that  the  NR  methods  are  more  efficient  than  the  NCG  methods.  However,  the  computational 
cost  incurred  for  a  full  Newton  method,  in  which  the  Hessian  is  recomputed  and  stored  at  every 
instance,  can  be  prohibitive  in  general.  A  truncated  Newton  method  may  therefore  be  employed 
with  a  careful  selection  of  the  convergence  tolerance  such  that  the  search  directions  are  predicted 
with  reasonable  accuracy.  Many  complexities  left  unconsidered  in  this  work  still  remain.  First, 
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the  nature  of  atomic  forces  and  interactions  are  highly  nonlinear  with  respect  to  atomic 
coordinates.  Thus,  in  implicitly  solving  for  quasistatic  configurations  of  the  material,  the 
performance  of  matrix  methods  that  are  founded  on  linearization  principles  heavily  depend  on 
the  condition  of  the  material,  namely  its  proximity  to  a  minimum.  Such  approaches  are  therefore 
intrinsically  unsuitable  for  capturing  the  “nonconvex”  issues  involved  in  atomistic  problems. 
Indeed,  one  can  identify  states  of  deformation  that  result  in  instantaneous  negative-definite 
configurations  of  atoms  that  violate  the  presumptions  needed  to  employ  linear  solvers.  Thus,  the 
“region  of  viability”  of  linear  solvers  can  be  rather  small  in  the  overall  energy  landscape  of 
possible  solutions,  and  methods  such  as  the  proposed  TNR  method  may  suffer  significant 
limitations  in  general  problems. 
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Appendix.  Supporting  Derivations 


A.l  Derivation  of  Newton-Raphson  (NR)  Method 

The  NR  method  is  based  on  a  quadratic  approximation  to  the  given  function  n  (u)  in  each 
minimization  iteration.  The  Taylor  expansion  yields  the  following: 

n(u<*>  +  p(k))  =  n(u(fr))  +  P(fc)Tg(fc)  +  Ip(fc)TG(fc)p(fc)  +  0(||p||3) ,  (A-l) 

where  both  gradient  and  Hessian  G^)  are  evaluated  at  u^).  By  ignoring  the  truncation 
error,  it  follows  from  equation  A- 1  that 

an(UapWP<t>)  =  +  5gWpw  +  5C«-')Tp<*>  •  (A-2) 

Mathematically,  G^)  is  always  symmetric  if  n  is  twice  continuously  differentiable  around  u.(k\ 
resulting  in 

flf1(u(t>  +  p(t>)  _  (t)  +  G(i)pW .  (A-3) 

dpW 

According  to  the  necessary  condition  of  minimization,  it  is  known  that  G is  a  symmetric 

(k)  i  (fc) 

positive  definite  (SPD)  matrix.  Assuming  that u  i  P  is  the  minimizer,  then 

an("^  +  P<‘>)=g(t)  +  GWp(t)  =  0,  (A-4) 

which  leads  to  the  following  Newton  equation: 

G(fc)p(fc)  =  _g(fc) .  (A- 5) 

As  a  result,  NR  is  obtained  by  defining  the  search  direction  as  =  P(A  )  =  -G(A)  g^'^, 

and  the  step  length  a ^  is  set  to  be  1.0  by  default  due  to  the  existence  of  truncation  error  in 
equation  A-l.  A  line  search  is  still  required  to  find  an  appropriate  a^k\ 

A.2  Difference  Hessian 

The  difference  Hessian  method  is  evaluated  for  a  large  scale  three-dimensional  situation  in  the 
companion  report,  which  approximates  the  elements  of  the  Hessian  using  a  finite  differencing 
technique,  and  is  motivated  by  Taylor’s  theorem.  When  the  second  derivatives  of  objective 
function  fl  exists  and  are  Lipschitz  continuous,  Taylor’s  theorem  implies  the  following: 
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vn(u  +  P)  =  vn(u)  +  v2n(u).P  +  -P.v3n(u).P  +  +o(||P||3) 


where  is  scalar  product.  By  substituting 


P  =  rei 


it  follows  that 


rV2IT(u)ej  =  VIT(u  +  re,;)  —  VTI(u)  — — ei.V3Il(u).ei  +  C9(r3). 


(A-6) 

(A-7) 

(A-8) 


Similarly,  let  P  —  —  rei  where  r  >  0  be  a  small  increment  such  that  we  obtain  the  following: 


rV2n(u)e,;  -  -vn(u  -  rei)  +  VII(u)  +  —  ei:.V3n(u).ei  +  0{t3)  . 


(A-9) 


According  to  equations  A-8  and  A-9,  an  approximate  Hessian  can  be  formulated  using  the 
following  central  difference  method: 

G..  -  V*n(u)e,  -  g(u“’  +  Tei)  -  g("“'  -  7ei)  +  0(t2).  (A- 10) 

It 


Let  B  be  the  approximation  of  G  by  ignoring  the  truncation  error,  i.e., 

g (U(fc)  +  TQi)  -  g(u(fc)  -  Te,;) 


Bjk  O  — 


2  T 


(A-l  1) 


Then  the  difference  Hessian  method  is  easily  formulated  and  suitable  for  parallel  computing 
(each  element  of  the  Hessian  is  formulated  independently).  In  addition,  it  only  requires  the 
numerical  evaluation  of  the  gradient  g. 

Theorem  T.  Convergence  Rate  of  TNR:  Assume  that 

1.  =  Vn(uW)  is  continuously  differentiable  in  a  neighborhood  of  u*,  a  local  minimizer 
oflT 

2.  V2n(u*)  is  nonsingular  and  that  V2I~I  is  Lipschitz  continuous  at  u*.  Then  the  truncated 
NR  method  (see  references  16-25  in  section  6  of  this  report),  based  on  an  iterative  solver, 
has  a  super-linear  convergence  rate. 

3.  The  truncated  NR  method  is  employed  to  find  u*,  i.e.,  the  search  direction  d is  obtained 
by  solving  the  following  Newton  equation: 

G^fe)d(fc)  =  -g(fe)  (A- 12) 


with  the  following  convergence  criterion: 


„(fc)i 


<  £||g 


(fc)| 


(A- 13) 
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where  £k  —  £max  <  1  is  the  accuracy  tolerance,  then 


||g(fe+1)|| 

J'm  SUP  M  (fc)ll  <  £  <  1  ’ 


(A- 14) 


Proof:  Let  be  SPD  and  assume  there  exists  a  constant  cr  >  1  for  all  that  is  sufficiently 
close  to  u*>  II  <  Thus,  it  follows  from  equations  A- 12  and  A- 13  that 

(A- 15) 


dp}  <  a(||g^||  +  IhPlI)  <  (1  +  £)a||g^. 


According  to  the  Taylor’s  expansion  about  the  gradient  g^-*-1^  —  g^  +  it  follows  that 


g(fc+l)  =  g(fc)  +  G{k)d (fc)  +  0(||dP||2)  =  rf }  +  (P((l  +  £)2a2||gW||2  . 

Thus,  using  equation  A- 13,  it  follows  that 

||g(fc+1)||  <  e||g(fc)||  +  0(||g(fe)||2)  . 

As  a  result,  if  u is  sufficiently  close  to  u*  then  equation  A-14  holds. 


(A- 16) 


(A- 17) 
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